1. Introduction

It is a well-known and celebrated result that the uniform distribution on a finite set can be characterized as having maximal entropy. Jaynes used this idea as a foundation of statistical mechanics [1], and the Maximum Entropy Principle has become a popular principle for statistical inference [2-8]. Often it is used as a method to get prior distributions. On a finite set, for any distribution $P$ we have $H(P) = H(U) - D(P\|U)$, where $H$ is the Shannon entropy, $D$ is information divergence, and $U$ is the uniform distribution. Thus, maximizing $H(P)$ is equivalent to minimizing $D(P\|U)$. Minimization



Version March 29, 2009 submitted to Entropy 






of information divergence can be justified by the conditional limit theorem of Csiszár [9, Theorem 4]. So if we have a good reason to use the uniform distribution as prior distribution, we automatically get a justification of the Maximum Entropy Principle. The conditional limit theorem cannot justify the use of the uniform distribution itself, so we need something else. Here we shall focus on symmetry.
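The identity $H(P) = H(U) - D(P\|U)$ can be checked numerically on a small finite set. The following is a minimal sketch; the distribution `p` and the helper names `entropy` and `divergence` are illustrative choices, not taken from the paper, and natural logarithms are assumed.

```python
import math

def entropy(p):
    # Shannon entropy in nats; terms with zero probability contribute 0.
    return -sum(x * math.log(x) for x in p if x > 0)

def divergence(p, q):
    # Information divergence D(p||q) in nats, assuming q > 0 wherever p > 0.
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

# Illustrative distribution on a four-element set (an assumption for the example).
p = [0.5, 0.25, 0.125, 0.125]
u = [1.0 / len(p)] * len(p)

lhs = entropy(p)
rhs = entropy(u) - divergence(p, u)
print(lhs, rhs)
```

Both printed values agree up to rounding, which illustrates why maximizing entropy and minimizing divergence from the uniform distribution are the same problem on a finite set.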

Example 1. A die has six sides that can be permuted via rotations of the die. We note that not all 
permutations can be realized as rotations and not all rotations will give permutations. Let G be the 
group of permutations that can be realized as rotations. We shall consider G as the symmetry group of 
the die and observe that the uniform distribution on the six sides is the only distribution that is invariant 
under the action of the symmetry group G. 
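The claims in Example 1 can be illustrated by enumerating the rotation group of the die as permutations of its six sides. This is only a sketch: the labelling of the sides and the choice of two quarter turns as generators are assumptions made for the illustration.

```python
# Sides (assumed labelling): 0=top, 1=bottom, 2=front, 3=back, 4=left, 5=right.
# A permutation is a tuple p where p[i] is the side that side i is moved to.

def cycle(n, cyc):
    # Build the permutation of range(n) that rotates the listed sides cyclically.
    p = list(range(n))
    for a, b in zip(cyc, cyc[1:] + cyc[:1]):
        p[a] = b
    return tuple(p)

def compose(p, q):
    # Apply q first, then p.
    return tuple(p[q[i]] for i in range(len(q)))

# Quarter turns about the vertical axis and about the left-right axis.
spin_vertical = cycle(6, [2, 5, 3, 4])    # front -> right -> back -> left
spin_horizontal = cycle(6, [0, 2, 1, 3])  # top -> front -> bottom -> back

def generate(gens):
    # Close the generators under composition (enough for a finite group).
    identity = tuple(range(6))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

G = generate([spin_vertical, spin_horizontal])
orbit_of_top = {g[0] for g in G}
print(len(G), sorted(orbit_of_top))
```

The group has far fewer elements than the 720 permutations of six sides, and the orbit of a single side is the whole set of sides. An invariant distribution must be constant on orbits, so with a single orbit it must be uniform.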

Example 2. $G = \mathbb{R}/2\pi\mathbb{Z}$ is a commutative group that can be identified with the group $SO(2)$ of rotations in 2 dimensions. This is the simplest example of a group that is compact but not finite.

For an object with symmetries the symmetry group defines a group action on the object, and any 
group action on an object defines a symmetry group of the object. A special case of a group action of 
the group G is left translation of the elements in G. Instead of studying distributions on objects with 
symmetries, in this paper we shall focus on distributions on the symmetry groups themselves. It is no 
serious restriction because a distribution on the symmetry group of an object will induce a distribution 
on the object itself. 

Convergence of convolutions of probability measures was studied by Stromberg [10], who proved weak convergence. An information theoretic approach was introduced by Csiszár [11]. Classical methods involving characteristic functions have been used to give conditions for uniform convergence of the densities of convolutions [12]. See [13] for a review of the subject and further references.

Finally it is shown that convergence in information divergence corresponds to uniform convergence of the rate distortion function and that weak convergence corresponds to pointwise convergence of the rate distortion function. In this paper we shall mainly consider convolutions as Markov chains. This gives us a tool that allows us to prove convergence of i.i.d. convolutions, and the rate of convergence is proved to be exponential.

The rest of the paper is organized as follows. In Section 2 we establish a number of simple results on distortion functions on compact sets. These results will be used in Section 4. In Section 3 we define the uniform distribution on a compact group as the uniquely determined Haar probability measure. In Section 4 it is shown that the uniform distribution is the maximum entropy distribution on a compact group in the sense that it maximizes the rate distortion function at any positive distortion level. Convergence of convolutions of a distribution to the uniform distribution is established in Section 5 using Markov chain techniques, and the rate of convergence is discussed in Section 6. The group $SO(2)$ is used as our running example. We finish with a short discussion.

2. Distortion on compact groups 

Let $G$ be a compact group where $*$ denotes the composition. The neutral element will be denoted $e$ and the inverse of the element $g$ will be denoted $g^{-1}$.







Figure 1. Squared Euclidean distance between the rotation angles x and y.



We shall start with some general comments on distortion functions on compact sets. Assume that the group plays the role of both source alphabet and reproduction alphabet. A distortion function $d : G \times G \to \mathbb{R}$ is given and we will assume that $d(x,y) \geq 0$ with equality if and only if $x = y$. We will also assume that the distortion function is continuous.

Example 3. As distortion function on $SO(2)$ we use the squared Euclidean distance between the corresponding points on the unit circle, i.e.

\[ d(x,y) = 2 - 2\cos(x - y). \]

This is illustrated in Figure 1.

The distortion function might be a metric, but even if the distortion function is not a metric, the relation between the distortion function and the topology is the same as if it were a metric. One way of constructing a distortion function on a group is to use the squared Hilbert-Schmidt norm in a unitary representation of the group.

Theorem 4. If $C$ is a compact set and $d : C \times C \to \mathbb{R}$ is a non-negative continuous distortion function such that $d(x,y) = 0$ if and only if $x = y$, then the topology on $C$ is generated by the distortion balls $\{x \in C \mid d(x,y) < r\}$ where $y \in C$ and $r > 0$.

Proof. We have to prove that a subset $B \subseteq C$ is open if and only if for any $y \in B$ there exists a ball that is a subset of $B$ and contains $y$. Assume that $B \subseteq C$ is open and that $y \in B$. Then the complement $C \setminus B$ is compact. Hence, the function $x \mapsto d(x,y)$ has a minimum $r$ on $C \setminus B$, and $r$ must be positive because $r = d(x,y) = 0$ would imply that $x = y \in B$. Therefore $\{x \in C \mid d(x,y) < r\} \subseteq B$.

Continuity of $d$ implies that the balls $\{x \in C \mid d(x,y) < r\}$ are open. If every point in $B$ is contained in an open ball that is a subset of $B$, then $B$ is a union of open sets and therefore open. □

The following lemma may be considered as a kind of uniform continuity of the distortion function or as a substitute for the triangle inequality when $d$ is not a metric.

Lemma 5. If $C$ is a compact set and $d : C \times C \to \mathbb{R}$ is a non-negative continuous distortion function such that $d(x,y) = 0$ if and only if $x = y$, then there exists a continuous function $f_1$ satisfying $f_1(0) = 0$ such that

\[ |d(x,y) - d(z,y)| \leq f_1(d(z,x)) \quad \text{for } x,y,z \in C. \tag{1} \]

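Proof. Assume that the lemma does not hold. Then there exist $\varepsilon > 0$ and a net $(x_\lambda, y_\lambda, z_\lambda)_{\lambda \in \Lambda}$ such that

\[ |d(x_\lambda, y_\lambda) - d(z_\lambda, y_\lambda)| > \varepsilon \]

and $d(z_\lambda, x_\lambda) \to 0$. A net in a compact set has a convergent subnet, so without loss of generality we may assume that the net $(x_\lambda, y_\lambda, z_\lambda)_{\lambda \in \Lambda}$ converges to some triple $(x_\infty, y_\infty, z_\infty)$. By continuity of the distortion function we get

\[ |d(x_\infty, y_\infty) - d(z_\infty, y_\infty)| \geq \varepsilon \]

and $d(z_\infty, x_\infty) = 0$, which implies $z_\infty = x_\infty$, and we have a contradiction. □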

We note that if a distortion function satisfies (1) then it defines a topology in which the distortion balls 
are open. 

In order to define the weak topology on probability distributions we extend the distortion function from $C \times C$ to $M_+^1(C) \times M_+^1(C)$ via

\[ d(P,Q) = \inf E[d(X,Y)], \]

where $X$ and $Y$ are random variables with values in $C$ and the infimum is taken over all joint distributions on $(X,Y)$ such that the marginal distribution of $X$ is $P$ and the marginal distribution of $Y$ is $Q$. The distortion function is continuous, so $(x,y) \mapsto d(x,y)$ has a maximum that we denote $d_{\max}$.

Theorem 6. If $C$ is a compact set and $d : C \times C \to \mathbb{R}$ is a non-negative continuous distortion function such that $d(x,y) = 0$ if and only if $x = y$, then

\[ |d(P,Q) - d(S,Q)| \leq f_2(d(S,P)) \quad \text{for } P,Q,S \in M_+^1(C) \]

for some continuous function $f_2$ satisfying $f_2(0) = 0$.

Proof. According to Lemma 5 there exists a function $f_1$ satisfying (1). We use that

\begin{align*}
E[|d(X,Y) - d(Z,Y)|] &\leq E[f_1(d(Z,X))] \\
&= E[f_1(d(Z,X)) \mid d(Z,X) \leq \delta] \cdot P(d(Z,X) \leq \delta) \\
&\quad + E[f_1(d(Z,X)) \mid d(Z,X) > \delta] \cdot P(d(Z,X) > \delta) \\
&\leq f_1(\delta) + f_1(d_{\max}) \cdot \frac{d(S,P)}{\delta}.
\end{align*}

This holds for all $\delta > 0$ and in particular for $\delta = (d(S,P))^{1/2}$, which proves the theorem. □

The theorem can be used to construct the weak topology on $M_+^1(C)$ with

\[ \{P \in M_+^1(C) \mid d(P,Q) < r\}, \]

$Q \in M_+^1(C)$, $r > 0$, as open balls that generate the topology. We note without proof that this definition is equivalent to the quite different definition of weak topology that one will find in most textbooks.

For a group $G$ we assume that the distortion function is right invariant in the sense that for all $x, y, z \in G$ the distortion function $d$ satisfies

\[ d(x * z, y * z) = d(x,y). \]

A right invariant distortion function satisfies $d(x,y) = d(x * y^{-1}, e)$, so right invariant continuous distortion functions on a group can be constructed from non-negative functions with a minimum in $e$.
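For the running example $SO(2)$ with $d(x,y) = 2 - 2\cos(x-y)$, right invariance and the reduction $d(x,y) = d(x * y^{-1}, e)$ can be checked numerically. The sketch below represents group elements as angles in $[0, 2\pi[$ with addition modulo $2\pi$; the function names and the random sampling are assumptions made for the illustration.

```python
import math
import random

TWO_PI = 2 * math.pi

def d(x, y):
    # Squared Euclidean distance between the points on the unit circle.
    return 2.0 - 2.0 * math.cos(x - y)

def compose(x, y):
    # Group composition on SO(2), written as addition of angles mod 2*pi.
    return (x + y) % TWO_PI

def inverse(x):
    return (-x) % TWO_PI

random.seed(0)
max_err = 0.0
for _ in range(1000):
    x, y, z = (random.uniform(0, TWO_PI) for _ in range(3))
    # Right invariance: d(x*z, y*z) = d(x, y).
    err_invariance = abs(d(compose(x, z), compose(y, z)) - d(x, y))
    # Reduction to a function of one variable: d(x, y) = d(x*y^{-1}, e) with e = 0.
    err_reduction = abs(d(x, y) - d(compose(x, inverse(y)), 0.0))
    max_err = max(max_err, err_invariance, err_reduction)
print(max_err)
```

The largest deviation is at the level of floating point rounding, consistent with the two identities.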

3. The Haar measure 

We use $*$ to denote convolution of probability measures on $G$. For $g \in G$ we shall use $g * P$ to denote the $g$-translation of the measure $P$ or, equivalently, the convolution with a measure concentrated in $g$. The $n$-fold convolution of a distribution $P$ with itself will be denoted $P^{*n}$. For random variables with values in $G$ one can formulate an analog of the central limit theorem. We recall some facts about probability measures on compact groups and their Haar measures.

Definition 7. Let $G$ be a group. A measure $P$ is said to be a left Haar measure if $g * P = P$ for any $g \in G$. Similarly, $P$ is said to be a right Haar measure if $P * g = P$ for any $g \in G$. A measure is said to be a Haar measure if it is both a left Haar measure and a right Haar measure.

Example 8. The uniform distribution on $SO(2)$ or $\mathbb{R}/2\pi\mathbb{Z}$ has density $1/2\pi$ with respect to the Lebesgue measure on $[0; 2\pi[$. The function

\[ f(x) = 1 + \sum_{n=1}^{\infty} a_n \cos(n(x + \phi_n)) \tag{2} \]

is the density (with respect to the uniform distribution) of a probability distribution $P$ on $SO(2)$ if the Fourier coefficients $a_n$ are sufficiently small so that $f$ is non-negative. A sufficient condition for $f$ to be non-negative is that $\sum_{n=1}^{\infty} |a_n| \leq 1$. Translation by $y$ gives a distribution with density

\[ f(x - y) = 1 + \sum_{n=1}^{\infty} a_n \cos(n(x - y + \phi_n)). \]

The distribution $P$ is invariant if and only if $f$ is 1 or, equivalently, all Fourier coefficients $(a_n)_{n \in \mathbb{N}}$ are 0.

A measure $P$ on $G$ is said to have full support if the support of $P$ is $G$, i.e. $P(A) > 0$ for any non-empty open set $A \subseteq G$. The following theorem is well-known [14-16].

Theorem 9. Let $U$ be a probability measure on the compact group $G$. Then the following five conditions are equivalent.

• $U$ is a left Haar measure.

• $U$ is a right Haar measure.

• $U$ has full support and is idempotent in the sense that $U * U = U$.

• There exists a probability measure $P$ on $G$ with full support such that $P * U = U$.

• There exists a probability measure $P$ on $G$ with full support such that $U * P = U$.

In particular a Haar probability measure is unique.

In [14-16] one can find the proof that any locally compact group has a Haar measure. The unique 
Haar probability measure on a compact group will be called the uniform distribution and denoted U. 
For probability measures $P$ and $Q$ the information divergence from $P$ to $Q$ is defined by

\[ D(P\|Q) = \begin{cases} \int \log\frac{dP}{dQ}\, dP, & \text{if } P \ll Q, \\ \infty, & \text{otherwise.} \end{cases} \]

We shall often calculate the divergence from a distribution to the uniform distribution $U$, and introduce the notation

\[ D(P) = D(P\|U). \]

For a random variable $X$ with values in $G$ we will sometimes write $D(X\|U)$ instead of $D(P\|U)$ when $X$ has distribution $P$.

Example 10. The distribution $P$ with density $f$ given by (2) has

\[ D(P) = \frac{1}{2\pi}\int_0^{2\pi} f(x) \log f(x)\, dx. \]

Let $G$ be a compact group with uniform distribution $U$ and let $F$ be a closed subgroup of $G$. Then the subgroup has a Haar probability measure $U_F$ and

\[ D(U_F) = \log([G:F]) \tag{3} \]

where $[G:F]$ denotes the index of $F$ in $G$. In particular $D(U_F)$ is finite if and only if $[G:F]$ is finite.

4. The rate distortion theory

We will develop aspects of the rate distortion theory of a compact group $G$. Let $P$ be a probability measure on $G$. We observe that compactness of $G$ implies that a covering of $G$ by distortion balls of radius $\delta > 0$ contains a finite covering. If $k$ is the number of balls in a finite covering then $R_P(\delta) \leq \log(k)$, where $R_P$ is the rate distortion function of the probability measure $P$. In particular the rate distortion function is upper bounded. The entropy of a probability distribution $P$ is given by $H(P) = R_P(0)$. If the group is finite then the uniform distribution maximizes the Shannon entropy $R_P(0)$, but if the group is not finite then in principle there is no entropy maximizer. As we shall see, the uniform distribution still plays the role of entropy maximizer in the sense that the uniform distribution maximizes the value $R_P(\delta)$ of the rate distortion function for any positive distortion level $\delta > 0$. The rate distortion function $R_P$ can be studied using its convex conjugate $R_P^*$ given by

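\[ R_P^*(\beta) = \sup_{\delta} \left( \beta \cdot \delta - R_P(\delta) \right). \]

The rate distortion function is then recovered by the formula

\[ R_P(\delta) = \sup_{\beta} \left( \beta \cdot \delta - R_P^*(\beta) \right). \]

The techniques are pretty standard [17].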

Theorem 11. The rate distortion function of the uniform distribution is given by

\[ R_U^*(\beta) = \log(Z(\beta)) \]

where $Z$ is the partition function defined by

\[ Z(\beta) = \int_G \exp(\beta \cdot d(g,e))\, dUg. \]

The rate distortion function of an arbitrary distribution $P$ satisfies

\[ R_U - D(P\|U) \leq R_P \leq R_U. \tag{4} \]

Proof. First we prove a Shannon type lower bound on the rate distortion function of an arbitrary distribution $P$ on the group. Let $X$ be a random variable with values in $G$ and distribution $P$, and let $\hat{X}$ be a random variable coupled with $X$ such that the mean distortion $E[d(X,\hat{X})]$ equals $\delta$. Then

\begin{align}
I(X;\hat{X}) &= D(X\|U \mid \hat{X}) - D(X\|U) \tag{5} \\
&= D(X * \hat{X}^{-1}\|U \mid \hat{X}) - D(X\|U) \tag{6} \\
&\geq D(X * \hat{X}^{-1}\|U) - D(X\|U). \tag{7}
\end{align}

Now, $E[d(X,\hat{X})] = E[d(X * \hat{X}^{-1}, e)]$ and

\[ D(X * \hat{X}^{-1}\|U) \geq D(P_\beta\|U) \]

where $P_\beta$ is the distribution that minimizes divergence under the constraint $E[d(Y,e)] = \delta$ when $Y$ has distribution $P_\beta$. The distribution $P_\beta$ is given by the density

\[ \frac{dP_\beta}{dU}(g) = \frac{\exp(\beta \cdot d(g,e))}{Z(\beta)} \]

where $\beta$ is determined by the condition $\delta = Z'(\beta)/Z(\beta)$.

If $P$ is uniform then a joint distribution is obtained by choosing $\hat{X}$ uniformly distributed, and choosing $Y$ distributed according to $P_\beta$ and independent of $\hat{X}$. Then $X = Y * \hat{X}$ is distributed according to $P_\beta * U = U$, and we have equality in (7). Hence the rate determined by the lower bound (7) is achievable for the uniform distribution, which proves the first part of the theorem and the left inequality in (4).

The joint distribution on $(X,\hat{X})$ that achieves the rate distortion function when $X$ has a uniform distribution defines a Markov kernel $\Psi : X \to \hat{X}$ that is invariant under translations in the group. For any distribution $P$ the joint distribution on $(X,\hat{X})$ determined by $P$ and $\Psi$ gives an achievable pair of distortion and rate that is on the rate distortion curve of the uniform distribution. This proves the right inequality in Equation (4). □

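Example 12. For the group $SO(2)$ the rate distortion function can be parametrized using the modified Bessel functions $I_j$, $j \in \mathbb{N}_0$. The partition function is given by

\begin{align*}
Z(\beta) &= \int_G \exp(\beta \cdot d(g,e))\, dUg \\
&= \frac{1}{2\pi}\int_0^{2\pi} \exp(\beta \cdot (2 - 2\cos x))\, dx \\
&= \exp(2\beta) \cdot \frac{1}{2\pi}\int_0^{2\pi} \exp(-2\beta \cdot \cos x)\, dx \\
&= \exp(2\beta) \cdot I_0(-2\beta).
\end{align*}

Hence $R_U^*(\beta) = \log(Z(\beta)) = 2\beta + \log(I_0(-2\beta))$. The distortion $\delta$ corresponding to $\beta$ is given by

\[ \delta = \frac{Z'(\beta)}{Z(\beta)} = 2 - 2\,\frac{I_1(-2\beta)}{I_0(-2\beta)} \]

and the corresponding rate is

\[ R_U(\delta) = \beta \cdot \delta - \left( 2\beta + \log(I_0(-2\beta)) \right). \]

These joint values of distortion and rate can be plotted with $\beta$ as parameter as illustrated in Figure 2.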
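The closed form of the partition function in Example 12 can be compared with direct numerical integration, and the parametrization of distortion and rate by $\beta$ can be evaluated. The following is a sketch using only the standard library: the power series for $I_0$, the midpoint rule, the finite difference step, and all function names are choices made for this illustration, not the paper's method.

```python
import math

def bessel_i0(z, terms=60):
    # Power series I_0(z) = sum_k (z/2)^(2k) / (k!)^2, truncated after `terms` terms.
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (z / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def partition_numeric(beta, n=2000):
    # Midpoint rule for (1/2pi) * integral over [0, 2pi] of exp(beta*(2 - 2cos x)).
    h = 2 * math.pi / n
    return sum(math.exp(beta * (2 - 2 * math.cos((i + 0.5) * h))) for i in range(n)) / n

def partition_closed(beta):
    # Closed form Z(beta) = exp(2*beta) * I_0(-2*beta).
    return math.exp(2 * beta) * bessel_i0(-2 * beta)

def distortion(beta, eps=1e-5):
    # delta = Z'(beta)/Z(beta), approximated by a central difference of log Z.
    upper = math.log(partition_closed(beta + eps))
    lower = math.log(partition_closed(beta - eps))
    return (upper - lower) / (2 * eps)

def rate(beta):
    # R_U(delta) = beta*delta - log Z(beta), with delta = distortion(beta).
    return beta * distortion(beta) - math.log(partition_closed(beta))

for beta in (-0.5, -1.0, -3.0):
    print(beta, partition_numeric(beta), partition_closed(beta), distortion(beta), rate(beta))
```

Negative values of $\beta$ are used because the rate distortion function is decreasing; at $\beta = 0$ the distortion equals the critical value 2 shown in Figure 2.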

The minimal rate of the uniform distribution is achieved when $X$ and $\hat{X}$ are independent. In this case the distortion is $E[d(X,\hat{X})] = \int_G d(x,e)\, dUx$. This distortion level will be called the critical distortion and will be denoted $d_{crit}$. On the interval $]0; d_{crit}]$ the rate distortion function is decreasing and the distortion rate function is the inverse $R_P^{-1}$ of the rate distortion function $R_P$ on this interval. The distortion rate function satisfies:

Theorem 13. The distortion rate function of an arbitrary distribution $P$ satisfies

\[ R_U^{-1}(\delta) - f_2(d(P,U)) \leq R_P^{-1}(\delta) \leq R_U^{-1}(\delta) \quad \text{for } \delta \leq d_{crit} \tag{8} \]

for some increasing continuous function $f_2$ satisfying $f_2(0) = 0$.

Figure 2. The rate distortion region of the uniform distribution on $SO(2)$ is shaded. The rate distortion function is the lower bounding curve. In the figure the rate is measured in nats. The critical distortion $d_{crit}$ equals 2, and the dashed line indicates $d_{\max} = 4$.

Proof. The right hand side follows because $R_U$ is decreasing in the interval $[0; d_{crit}]$. Let $X$ be a random variable with distribution $P$ and let $Y$ be a random variable coupled with $X$. Let $Z$ be a random variable coupled with $X$ such that $E[d(X,Z)] = d(P,U)$. The couplings between $X$ and $Y$, and between $X$ and $Z$, can be extended to a joint distribution on $X$, $Y$ and $Z$ such that $Y$ and $Z$ are independent given $X$. For this joint distribution we have

\[ I(Z;Y) \leq I(X;Y) \]

and

\[ |E[d(Z,Y)] - E[d(X,Y)]| \leq f_2(d(P,U)). \]

We have to prove that

\[ E[d(X,Y)] \geq R_U^{-1}(I(X;Y)) - f_2(d(P,U)), \]

but $I(Z;Y) \leq I(X;Y)$ so it is sufficient to prove that

\[ E[d(X,Y)] \geq R_U^{-1}(I(Z;Y)) - f_2(d(P,U)), \]

and this follows because $E[d(Z,Y)] \geq R_U^{-1}(I(Z;Y))$. □

5. Convergence of convolutions 

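We shall prove that under certain conditions the $n$-fold convolutions $P^{*n}$ converge to the uniform distribution.

Example 14. The function

\[ f(x) = 1 + \sum_{n=1}^{\infty} a_n \cos(n(x + \phi_n)) \]

is the density of a probability distribution $P$ on $G$ if the Fourier coefficients $a_n$ are sufficiently small. If $(a_n)$ and $(b_n)$ are Fourier coefficients of $P$ and $Q$, with phases $(\phi_n)$ and $(\psi_n)$, then the convolution has density

\begin{align*}
&\frac{1}{2\pi}\int_0^{2\pi} \Big(1 + \sum_{n=1}^{\infty} a_n \cos n(x - y + \phi_n)\Big)\Big(1 + \sum_{n=1}^{\infty} b_n \cos n(y + \psi_n)\Big)\, dy \\
&= 1 + \frac{1}{2\pi}\sum_{n=1}^{\infty}\int_0^{2\pi} a_n b_n \cos n(x - y + \phi_n) \cos n(y + \psi_n)\, dy \\
&= 1 + \frac{1}{2\pi}\sum_{n=1}^{\infty}\int_0^{2\pi} a_n b_n \cos(n(x + \phi_n + \psi_n) - ny) \cos(ny)\, dy \\
&= 1 + \frac{1}{2\pi}\sum_{n=1}^{\infty} a_n b_n \int_0^{2\pi} \Big(\cos(n(x + \phi_n + \psi_n))\cos(ny) + \sin(n(x + \phi_n + \psi_n))\sin(ny)\Big)\cos(ny)\, dy \\
&= 1 + \sum_{n=1}^{\infty} a_n b_n \cos(n(x + \phi_n + \psi_n)) \cdot \frac{1}{2\pi}\int_0^{2\pi} \cos^2(ny)\, dy \\
&= 1 + \sum_{n=1}^{\infty} \frac{a_n b_n}{2} \cos(n(x + \phi_n + \psi_n)).
\end{align*}

Therefore the $n$-fold convolution has density

\[ 1 + \sum_{k=1}^{\infty} \frac{a_k^n}{2^{n-1}} \cos(k(x + n\phi_k)) = 1 + \sum_{k=1}^{\infty} 2\left(\frac{a_k}{2}\right)^n \cos(k(x + n\phi_k)). \]

Therefore each of the Fourier coefficients is exponentially decreasing.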
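The rule that convolution multiplies Fourier coefficients and divides by 2, while phases add, can be checked numerically for densities with a single harmonic. This is a sketch under illustrative assumptions: the coefficients, phases, grid size, and helper names are chosen for the example only.

```python
import math

def density(a, n, phase):
    # Density 1 + a*cos(n*(x + phase)) with respect to the uniform distribution.
    return lambda x: 1.0 + a * math.cos(n * (x + phase))

def convolve(f, g, x, m=256):
    # Rectangle rule for (1/2pi) * integral over [0, 2pi] of f(x - y) * g(y) dy.
    h = 2 * math.pi / m
    return sum(f(x - i * h) * g(i * h) for i in range(m)) / m

# Illustrative parameters (assumptions for this example).
a, b, n, phi, psi = 0.8, 0.6, 3, 0.4, 1.1
f, g = density(a, n, phi), density(b, n, psi)

# Predicted density of the convolution: coefficient a*b/2 and phase phi + psi.
predicted = density(a * b / 2, n, phi + psi)
max_err = max(abs(convolve(f, g, x) - predicted(x)) for x in [0.1 * k for k in range(63)])
print(max_err)

def coefficient_after(a, n_fold):
    # Fourier coefficient of the n-fold convolution: 2*(a/2)^n.
    return 2 * (a / 2) ** n_fold

print([coefficient_after(a, k) for k in range(1, 6)])
```

Since $|a_k| \leq 1$ for a non-negative density of this form, the factor $a_k/2$ has modulus at most $1/2$, which is why the coefficients decay geometrically.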



Clearly, if $P$ is uniform on a proper subgroup then convergence does not hold. In several papers on this topic [13, 18, and references therein] it is claimed and "proved" that if convergence does not hold then the support of $P$ is contained in the coset of a proper normal subgroup. The proofs therefore contain errors that seem to have been copied from paper to paper. To avoid this problem and make this paper more self-contained, we shall reformulate and reprove some already known theorems.

In the theory of finite Markov chains it is well-known that there exists an invariant probability measure. Certain Markov chains exhibit periodic behavior where a certain distribution is repeated after a number of transitions. All distributions in such a cycle will lie at a fixed distance from any (fixed) measure, where the distance is given by information divergence or total variation (or any other Csiszár $f$-divergence). It is also well-known that finite Markov chains without periodic behavior are convergent. In general a Markov chain will converge to a "cyclic" behavior as stated in the following theorem [19].

Theorem 15. Let $\Phi$ be a transition operator on a state space $A$ with an invariant probability measure $Q_{in}$. If $D(S\|Q_{in}) < \infty$ then there exists a probability measure $Q^*$ such that $D(\Phi^n S\|\Phi^n Q^*) \to 0$ and $D(\Phi^n Q^*\|Q_{in})$ is constant.

We shall also use the following proposition that has a purely computational proof [20].

Proposition 16. Let $P_x$, $x \in X$ be distributions and let $Q$ be a probability distribution on $X$. Then

\[ \int D(P_x\|Q)\, dQx = D\Big(\int P_t\, dQt \,\Big\|\, Q\Big) + \int D\Big(P_x \,\Big\|\, \int P_t\, dQt\Big)\, dQx. \]

We denote the set of probability measures on $G$ by $M_+^1(G)$.

Theorem 17. Let $P$ be a distribution on a compact group $G$ and assume that the support of $P$ is not contained in any nontrivial coset of a subgroup of $G$. Then, if $D(S\|U)$ is finite, $D(P^{*n} * S\|U) \to 0$ for $n \to \infty$.

Proof. Let $\Psi : G \to M_+^1(G)$ denote the Markov kernel $\Psi(g) = P * g$. Then $P^{*n} * S = \Psi^{n-1}(P * S)$. Thus there exists a probability measure $Q$ on $G$ such that $D(\Psi^n(P * S)\|\Psi^n(Q)) \to 0$ for $n \to \infty$ and such that $D(\Psi^n(Q))$ is constant. We shall prove that $Q = U$.

First we note that

\begin{align*}
D(Q) &= D(P * Q) \\
&= \int_G \left( D(g * Q) - D(g * Q\|P * Q) \right) dPg \\
&= D(Q) - \int_G D(g * Q\|P * Q)\, dPg.
\end{align*}

Therefore $g * Q = P * Q$ for $P$ almost every $g \in G$. Thus there exists at least one $g_0 \in G$ such that $g_0 * Q = P * Q$. Then $Q = \tilde{P} * Q$ where $\tilde{P} = g_0^{-1} * P$.

Let $\tilde{\Psi} : G \to M_+^1(G)$ denote the Markov kernel $g \mapsto \tilde{P} * g$. Put

\[ P_n = \frac{1}{n}\sum_{i=1}^{n} \tilde{P}^{*i} = \frac{1}{n}\sum_{i=1}^{n} \tilde{\Psi}^{i-1}(\tilde{P}). \]

According to [19] this ergodic mean will converge to a distribution $T$ such that $\tilde{\Psi}(T) = T$, so that $T * \tilde{P} = T$. Hence we also have that $T * T = T$, i.e. $T$ is idempotent and therefore supported by a subgroup of $G$. We know that the support of $P$ is not contained in any nontrivial coset of a subgroup of $G$, so the support of $T$ must be $G$. We also get $Q = T * Q$, which together with Theorem 9 implies that $Q = U$. □

By choosing $S = P$ we get the following corollary.

Corollary 18. Let $P$ be a probability measure on the compact group $G$ with Haar probability measure $U$. Assume that the support of $P$ is not contained in any coset of a proper subgroup of $G$ and $D(P\|U)$ is finite. Then $D(P^{*n}\|U) \to 0$ for $n \to \infty$.
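The role of the support condition can be illustrated on the finite cyclic group $\mathbb{Z}_4$, where the Haar probability measure is the uniform distribution on four points. The sketch below compares a distribution whose support is not contained in a coset of a proper subgroup with one supported on the coset $1 + \{0, 2\}$; the two distributions and all function names are assumptions chosen for this illustration.

```python
import math

def convolve(p, q):
    # Convolution of two distributions on the cyclic group Z_m.
    m = len(p)
    return [sum(p[i] * q[(k - i) % m] for i in range(m)) for k in range(m)]

def divergence_from_uniform(p):
    # D(P||U) in nats, where U is uniform on Z_m.
    m = len(p)
    return sum(x * math.log(x * m) for x in p if x > 0)

def divergences(p, steps):
    # D(P^{*n}||U) for n = 1, ..., steps.
    out, power = [], p
    for _ in range(steps):
        out.append(divergence_from_uniform(power))
        power = convolve(power, p)
    return out

good = [0.5, 0.5, 0.0, 0.0]  # support {0, 1}: not inside a coset of a proper subgroup
bad = [0.0, 0.5, 0.0, 0.5]   # support {1, 3} = 1 + {0, 2}: a coset of a proper subgroup

good_trace = divergences(good, 30)
bad_trace = divergences(bad, 30)
print(good_trace[-1], bad_trace[-1])
```

For the first distribution the divergence tends to zero, while for the second the convolutions alternate between the two cosets of $\{0, 2\}$ and the divergence stays at $\log 2$.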

Corollary 18 together with Theorem 11 implies the following result.

Corollary 19. Let $P$ be a probability measure on the compact group $G$ with Haar probability measure $U$. Assume that the support of $P$ is not contained in any coset of a proper subgroup of $G$ and $D(P\|U)$ is finite. Then the rate distortion function of $P^{*n}$ converges uniformly to the rate distortion function of the uniform distribution.

We also get weak versions of these results.

Corollary 20. Let $P$ be a probability measure on the compact group $G$ with Haar probability measure $U$. Assume that the support of $P$ is not contained in any coset of a proper subgroup of $G$. Then $P^{*n}$ converges to $U$ in the weak topology, i.e. $d(P^{*n}, U) \to 0$ for $n \to \infty$.

Proof. If we take $S = P_\beta$ then $D(P_\beta)$ is finite and $D(P^{*n} * P_\beta\|U) \to 0$ for $n \to \infty$. We have

\begin{align*}
d(P^{*n} * P_\beta, U) &\leq d_{\max} \|P^{*n} * P_\beta - U\| \\
&\leq d_{\max} \left( 2 D(P^{*n} * P_\beta\|U) \right)^{1/2}
\end{align*}

implying that $d(P^{*n} * P_\beta, U) \to 0$ for $n \to \infty$. Now

\begin{align*}
|d(P^{*n}, U) - d(P^{*n} * P_\beta, U)| &\leq f_2(d(P^{*n} * P_\beta, P^{*n})) \\
&\leq f_2(d(P_\beta, e)).
\end{align*}

Therefore $\limsup_{n \to \infty} d(P^{*n}, U) \leq f_2(d(P_\beta, e))$ for all $\beta$, which implies that

\[ \limsup_{n \to \infty} d(P^{*n}, U) = 0. \; □ \]

Corollary 21. Let $P$ be a probability measure on the compact group $G$ with Haar probability measure $U$. Assume that the support of $P$ is not contained in any coset of a proper subgroup of $G$ and $D(P\|U)$ is finite. Then $R_{P^{*n}}$ converges to $R_U$ pointwise on the interval $]0; d_{\max}[$ for $n \to \infty$.

Proof. Corollary 20 together with Theorem 13 implies uniform convergence of the distortion rate function for distortion less than $d_{crit}$. This implies pointwise convergence of the rate distortion function on $]0; d_{crit}[$ because rate distortion functions are convex functions. The same argument works in the interval $]d_{crit}; d_{\max}[$. Pointwise convergence in $d_{crit}$ must also hold because of continuity. □

6. Rate of convergence 

Normally the rate of convergence will be exponential. If the density is lower bounded this is well- 
known. We bring a simplified proof of this. 

Lemma 22. Let $P$ be a probability distribution on the compact group $G$ with Haar probability measure $U$. If $dP/dU \geq c > 0$ and $D(P)$ is finite, then

\[ D(P^{*n}) \leq (1 - c)^{n-1} D(P). \]

Proof. First we write

\[ P = (1 - c) \cdot S + c \cdot U \]

where $S$ denotes the probability measure

\[ S = \frac{P - c \cdot U}{1 - c}. \]

For any distribution $Q$ on $G$ we have

\begin{align*}
D(Q * P) &= D((1 - c) \cdot Q * S + c \cdot Q * U) \\
&\leq (1 - c) \cdot D(Q * S) + c \cdot D(Q * U) \\
&\leq (1 - c) \cdot D(Q) + c \cdot D(U) \\
&= (1 - c) \cdot D(Q).
\end{align*}

Here we have used convexity of divergence. □
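The bound of Lemma 22 can be checked numerically on a finite cyclic group, where $dP/dU$ at a point equals the number of group elements times the point probability. The sketch below uses an illustrative distribution on $\mathbb{Z}_4$; the distribution, the small numerical tolerance, and the function names are assumptions for the example.

```python
import math

def convolve(p, q):
    # Convolution of two distributions on the cyclic group Z_m.
    m = len(p)
    return [sum(p[i] * q[(k - i) % m] for i in range(m)) for k in range(m)]

def divergence_from_uniform(p):
    # D(P) = D(P||U) in nats, with U uniform on Z_m.
    m = len(p)
    return sum(x * math.log(x * m) for x in p if x > 0)

def contraction_holds(p, n_max, tol=1e-12):
    # Check D(P^{*n}) <= (1 - c)^(n-1) * D(P) for n = 1, ..., n_max,
    # where c = min dP/dU = m * min(p).
    m = len(p)
    c = m * min(p)
    d0 = divergence_from_uniform(p)
    power = p
    for n in range(1, n_max + 1):
        if divergence_from_uniform(power) > (1 - c) ** (n - 1) * d0 + tol:
            return False
        power = convolve(power, p)
    return True

p = [0.4, 0.3, 0.2, 0.1]  # illustrative distribution with dP/dU >= 0.4
print(contraction_holds(p, 12))
```

In this example the observed decay is faster than the guaranteed factor $1 - c$ per step; the lemma only provides an upper bound.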
If a distribution $P$ has support in a proper subgroup $F$ then

\begin{align*}
D(P) &\geq D(U_F) \\
&= \log([G:F]) \\
&\geq \log(2) = 1 \text{ bit.}
\end{align*}

Therefore $D(P) < 1$ bit implies that $P$ cannot be supported by a proper subgroup, but it implies more.

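Proposition 23. If $P$ is a distribution on the compact group $G$ and $D(P) < 1$ bit then $\frac{d(P * P)}{dU}$ is lower bounded by a positive constant.

Proof. The condition $D(P) < 1$ bit implies that $U\{\frac{dP}{dU} > 0\} > 1/2$. Hence there exists $\varepsilon > 0$ such that $U\{\frac{dP}{dU} > \varepsilon\} > 1/2$. We have

\begin{align*}
\frac{d(P * P)}{dU}(y) &= \int_G \frac{dP}{dU}(x) \cdot \frac{dP}{dU}(y - x)\, dUx \\
&\geq \int_{\{\frac{dP}{dU}(x) > \varepsilon\} \cap \{\frac{dP}{dU}(y - x) > \varepsilon\}} \varepsilon \cdot \varepsilon\, dUx \\
&= \varepsilon^2 \cdot U\Big( \Big\{\frac{dP}{dU}(x) > \varepsilon\Big\} \cap \Big\{\frac{dP}{dU}(y - x) > \varepsilon\Big\} \Big).
\end{align*}

Using the inclusion-exclusion inequalities we get

\begin{align*}
U\Big( \Big\{\frac{dP}{dU}(x) > \varepsilon\Big\} \cap \Big\{\frac{dP}{dU}(y - x) > \varepsilon\Big\} \Big) &\geq U\Big\{\frac{dP}{dU}(x) > \varepsilon\Big\} + U\Big\{\frac{dP}{dU}(y - x) > \varepsilon\Big\} - 1 \\
&= 2 \cdot U\Big\{\frac{dP}{dU} > \varepsilon\Big\} - 1.
\end{align*}

Hence

\[ \frac{d(P * P)}{dU}(y) \geq \varepsilon^2 \Big( 2 \cdot U\Big\{\frac{dP}{dU} > \varepsilon\Big\} - 1 \Big) > 0 \]

for all $y \in G$. □

Combining Theorem 17, Lemma 22, and Proposition 23 we get the following result.

Theorem 24. Let $P$ be a probability measure on a compact group $G$ with Haar probability measure $U$. If the support of $P$ is not contained in any coset of a proper subgroup of $G$ and $D(P\|U)$ is finite, then the rate of convergence of $D(P^{*n}\|U)$ to zero is exponential.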

As a corollary we get the following result that was first proved by Kloss [21] for total variation. 

Corollary 25. Let $P$ be a probability measure on the compact group $G$ with Haar probability measure $U$. If the support of $P$ is not contained in any coset of a proper subgroup of $G$ and $D(P\|U)$ is finite, then $P^{*n}$ converges to $U$ in variation and the rate of convergence is exponential.

Proof. This follows directly from Pinsker's inequality [22, 23]

\[ \frac{1}{2}\|P^{*n} - U\|^2 \leq D(P^{*n}\|U). \; □ \]
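Pinsker's inequality, and the resulting decay of the variation distance, can be observed on a finite cyclic group. The sketch below assumes that the norm is the total variation ($L^1$) norm and that divergence is measured in nats; the distribution on $\mathbb{Z}_4$, the tolerance, and the function names are illustrative.

```python
import math

def convolve(p, q):
    # Convolution of two distributions on the cyclic group Z_m.
    m = len(p)
    return [sum(p[i] * q[(k - i) % m] for i in range(m)) for k in range(m)]

def divergence_from_uniform(p):
    # D(P||U) in nats, with U uniform on Z_m.
    m = len(p)
    return sum(x * math.log(x * m) for x in p if x > 0)

def variation_from_uniform(p):
    # Variation norm ||P - U|| as the sum of absolute differences.
    m = len(p)
    return sum(abs(x - 1.0 / m) for x in p)

p = [0.4, 0.3, 0.2, 0.1]  # illustrative distribution on Z_4
power = p
pinsker_ok = True
variations = []
for n in range(1, 7):
    v = variation_from_uniform(power)
    variations.append(v)
    # Pinsker: (1/2) * ||P^{*n} - U||^2 <= D(P^{*n}||U), up to rounding.
    if 0.5 * v * v > divergence_from_uniform(power) + 1e-12:
        pinsker_ok = False
    power = convolve(power, p)
print(pinsker_ok, variations)
```

The variation distances shrink by a roughly constant factor per convolution in this example, in line with the exponential rate stated in Corollary 25.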

Corollary 26. Let $P$ be a probability measure on the compact group $G$ with Haar probability measure $U$. If the support of $P$ is not contained in any coset of a proper subgroup of $G$ and $D(P\|U)$ is finite, then the density

\[ \frac{dP^{*n}}{dU} \]

converges to 1 pointwise almost surely for $n$ tending to infinity.

Proof. The variation norm can be written as

\[ \|P^{*n} - U\| = \int_G \left| \frac{dP^{*n}}{dU} - 1 \right| dU. \]

Thus

\[ U\left( \left| \frac{dP^{*n}}{dU} - 1 \right| \geq \varepsilon \right) \leq \frac{\|P^{*n} - U\|}{\varepsilon}. \]

The result follows by the exponential rate of convergence of $P^{*n}$ to $U$ in total variation combined with the Borel-Cantelli Lemma. □

7. Discussion 



In this paper we have assumed the existence of the Haar measure by referring to the literature. With the Haar measure we have then proved convergence of convolutions using Markov chain techniques. The Markov chain approach can also be used to prove the existence of the Haar measure by simply referring to the fact that a homogeneous Markov chain on a compact set has an invariant distribution. The problem with this approach is that the proof that a Markov chain on a compact set has an invariant distribution is not easier than the proof of the existence of the Haar measure and is less well known.

We have shown that the Haar probability measure maximizes the rate distortion function at any distortion level. The usual proofs of the existence of the Haar measure use a kind of covering argument that is very close to the techniques found in rate distortion theory. There is a chance that one can get an information theoretic proof of the existence of the Haar measure. It seems obvious to use concavity arguments as one would do for Shannon entropy but, as proved by Ahlswede [24], the rate distortion function at a given distortion level is not a concave function of the underlying distribution, so some more refined technique is needed.

As noted in the introduction, for any algebraic structure $A$ the group $\mathrm{Aut}(A)$ can be considered as a symmetry group, and if it has a compact subgroup, the results of this paper apply to that subgroup. It would be interesting to extend the information theoretic approach to the algebraic object $A$ itself, but in general there is no known equivalent to the Haar measure for other algebraic structures. Algebraic structures are used extensively in channel coding theory and cryptography, so although the theory may become more involved, extensions of the results presented in this paper are definitely worthwhile.